Abstract. The weight space of the Ising perceptron in which a set of random
patterns is stored is examined using the generating function of the partition function
$\phi(n) = (1/N) \log [Z^n]$ as the dimension of the weight vector $N$ tends to infinity, where
$Z$ is the partition function and $[\cdots]$ represents the configurational average. We utilize
$\phi(n)$ for two purposes, depending on the value of the ratio $\alpha = M/N$, where $M$ is the
number of random patterns. For $\alpha < \alpha_s = 0.833\ldots$, we employ $\phi(n)$, in conjunction
with Parisi's one-step replica symmetry breaking scheme in the limit of $n \to 0$, to
evaluate the complexity that characterizes the number of disjoint clusters of weights
that are compatible with a given set of random patterns, which indicates that, in typical
cases, the weight space is equally dominated by a single large cluster of exponentially
many weights and exponentially many small clusters of a single weight. For $\alpha > \alpha_s$,
on the other hand, $\phi(n)$ is used to assess the rate function of a small probability that
a given set of random patterns is atypically separable by the Ising perceptrons. We
show that the analyticity of the rate function changes at $\alpha = \alpha_{\mathrm{GD}} = 1.245\ldots$, which
implies that the dominant configuration of the atypically separable patterns exhibits a
phase transition at this critical ratio. Extensive numerical experiments are conducted
to support the theoretical predictions.

1. Introduction

The generating function (density) with respect to the partition function $Z$:

$$\phi(n) = \frac{1}{N} \log [Z^n] \quad (n \in \mathbb{R}), \qquad (1)$$

plays a key role in research on disordered systems, where $N$ denotes the size of the
objective system and $[\cdots]$ denotes the average over the quenched randomness. Assessing
$\phi(n)$ for $\forall n \in \mathbb{R}$ exactly is, in general, difficult, whereas the analytical evaluation for
$n = 1, 2, \ldots \in \mathbb{N}$, in conjunction with the use of the saddle point method as $N \to \infty$, is
possible for a class of systems. This indicates that (1) can be practically evaluated by
analytically continuing the expressions of $\phi(n)$ evaluated for $n \in \mathbb{N}$ to $n \in \mathbb{R}$, which is
often referred to as the replica method.

In most models of statistical mechanics of disordered systems, the probability $P(f)$ that
the free energy density, $-(1/N) \log Z$, takes a certain value $f$ can be expressed
in large deviation statistics as

$$P(f) \sim \exp\{N R(f)\}, \qquad (2)$$

where $R(f) \le 0$ is often referred to as the rate function. One recent advance in the
replica theory is the formation of a link between $R(f)$ and $\phi(n)$ [1, 2, 3]. When $R(f)$ is
a convex upward function, it can be assessed from $\phi(n)$, parameterized by $n \in \mathbb{R}$,
as

$$f(n) = -\frac{\partial \phi(n)}{\partial n}, \quad R(f(n)) = \phi(n) - n \frac{\partial \phi(n)}{\partial n}. \qquad (3)$$

This indicates that the typical value of $f$, which is characterized by the condition
$R(f) = (1/N) \log P(f) = 0$, can be evaluated as

$$f^* = -\lim_{n \to 0} \frac{\partial \phi(n)}{\partial n}, \qquad (4)$$

which is sometimes referred to as a replica trick formula. Equation (3) indicates that
$n = 1, 2, \ldots$ corresponds to atypical samples with $R(f) < 0$, representing a small probability.
This means that the replica trick can be regarded as a formula that infers the behavior
of typical samples by extrapolating the behavior for atypical samples.
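
The Legendre-type relation between $\phi(n)$ and $R(f)$, namely $f = -\partial\phi/\partial n$ and $R = \phi - n\,\partial\phi/\partial n$, can be checked numerically on a toy example. The following Python sketch assumes a hypothetical Gaussian rate function $R(f) = -(f - f_0)^2/(2\sigma^2)$, for which $\phi(n) = \max_f\{R(f) - nf\} = -n f_0 + \sigma^2 n^2/2$ in closed form. The names `phi_toy`, `rate_function_from_phi`, `F0`, and `SIGMA2` are illustrative assumptions and are not part of the analysis of the perceptron.

```python
import numpy as np

# Hypothetical toy parameters (assumed for illustration only).
F0, SIGMA2 = -0.7, 0.3


def phi_toy(n):
    """phi(n) = max_f {R(f) - n f} for the toy R(f) = -(f - F0)^2 / (2 SIGMA2)."""
    return -n * F0 + 0.5 * SIGMA2 * n ** 2


def rate_function_from_phi(phi, n, dn=1e-5):
    """Evaluate f(n) = -dphi/dn and R(f(n)) = phi(n) - n dphi/dn numerically."""
    n = np.asarray(n, dtype=float)
    dphi = (phi(n + dn) - phi(n - dn)) / (2.0 * dn)  # central difference
    return -dphi, phi(n) - n * dphi


n_grid = np.linspace(0.0, 2.0, 9)
f, R = rate_function_from_phi(phi_toy, n_grid)
R_exact = -(f - F0) ** 2 / (2.0 * SIGMA2)  # toy rate function at the same f
```

In this toy case, $n = 0$ returns the typical value $f = f_0$ with $R = 0$, whereas $n > 0$ probes atypical values of $f$ with $R < 0$.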

Another recent advance in the replica theory is the association between the complex
structure of phase space and a formalism of one-step replica symmetry breaking (1RSB)
[4, 5, 6]. In a number of systems that are subject to disordered interactions, the phase
space is considered to be divisible into exponentially many disjoint sets as $N \to \infty$.
Each of the disjoint sets is sometimes referred to as a pure state. Let us assume that
the number of pure states specified by the free energy (density) value $f$, $\mathcal{N}(f)$, is scaled
as

$$\mathcal{N}(f) \sim \exp\{N \Sigma(f)\}, \qquad (5)$$

where the exponent $\Sigma(f) \ge 0$ is referred to as the complexity. Saddle point evaluation of
$\sum_\gamma \exp\{-N x f_\gamma\}$, where $x \in \mathbb{R}$ is a certain control parameter and $\gamma$ and $f_\gamma$ are indices
of a pure state and its free energy, respectively, indicates that $\Sigma(f)$ can be evaluated
using another generating function $g(x) = (1/N) \log \sum_\gamma \exp\{-N x f_\gamma\}$ as

$$f(x) = -\frac{\partial g(x)}{\partial x}, \quad \Sigma(f(x)) = g(x) - x \frac{\partial g(x)}{\partial x}, \qquad (6)$$

which is parameterized by $x$ as long as $\Sigma(f)$ is convex upward. This formalism is defined
for each sample of quenched randomness and, therefore, has nothing to do with $\phi(n)$.
However, recent studies have revealed that typical $g(x)$ (more precisely, $(1/x) g(x)$) over
the quenched randomness can also be assessed from $\phi(n)$ by evaluation of (1) under
Parisi's 1RSB ansatz, handling the 1RSB parameter $x$ as a control parameter.

The concepts of the two exponents $R(f)$ and $\Sigma(f)$ are different in that $R(f) \le 0$
represents a small probability of atypical samples, whereas $\Sigma(f) \ge 0$ represents a large
number of pure states that occur for typical samples. However, the formal similarity of
(3) and (6) indicates that there might be a relationship between these two exponents. In
fact, when the 1RSB solution of $\phi(n)$, $\phi_{\mathrm{1RSB}}(n)$, is assessed using the replica symmetric
(RS) solution $\phi_{\mathrm{RS}}(n)$ as $\phi_{\mathrm{1RSB}}(n) = \mathrm{Extr}_x\{(n/x)\phi_{\mathrm{RS}}(x)\}$, where $\mathrm{Extr}_x\{\cdots\}$ denotes
the operation of extremization with respect to $x$, the functional forms of $R(f)$ and
$\Sigma(f)$ are in agreement [21 E]. In addition, the model class for which this property
holds is rather wide, and includes random energy models [3, E] and $p$-body spin glass
models without external fields [2]. This naturally motivates us to further explore more
general relationships among $R(f)$, $\Sigma(f)$, and $\phi(n)$, including cases for which the formal
accordance of functional forms between $R(f)$ and $\Sigma(f)$ does not hold.

As a concrete effort for the exploration, we herein consider Ising perceptrons
that store random input-output patterns. There are two reasons for considering this
system. First, the Ising perceptrons can be macroscopically characterized by a few sets
of order parameters and are much easier to handle than systems of sparse couplings
[9, 10, 11, 12, 13], for which several numerical calculations are required. Despite the
simplicity, this model can still exhibit rich behavior in the phase space involving
nontrivial RSB phenomena [14, 15], which is highly suitable for our purpose. The second
reason is that the meaning of complexity for the perceptrons of finite size is rather clear.
For the Ising perceptrons, a pure state at zero temperature can be identified with a stable
cluster, the definition of which will be given in section 5, with respect to single spin flips
[16, 17, 18]. For samples of small systems, the size of the clusters can be numerically
evaluated by exhaustive enumeration without any ambiguity. This property is extremely
useful for justifying theoretical predictions through numerical experiments.

The remainder of the present paper is organized as follows. In the next section,
we introduce the model that is considered herein. In section 3, we provide a formalism
that assesses the complexity and rate function based on the 1RSB evaluation of the
generating function, for the Ising perceptrons. In the formalism, the complexity and
rate function are defined not for the free energy $f$ but for the entropy $s$, because the
analysis is carried out for the micro-canonical ensemble of Ising weights that are perfectly
compatible with a given set of random patterns. In section 4, we analyze the behavior
of the weight space of the Ising perceptron using this formalism. It is found that for
$\alpha = M/N < \alpha_s = 0.833\ldots$, where $M$ is the number of random patterns, the typical
phase space of the Ising perceptron is characterized by a convex downward complexity
being equally dominated by a single large cluster of exponentially many weights and
exponentially many small clusters of a single weight. For $\alpha > \alpha_s$, on the other hand,
the rate function becomes relevant for the analysis because random patterns that are
perfectly separable by the Ising weights are generated only atypically in this region. It
is also found that a certain transition of the rate function occurs at another critical
ratio $\alpha_{\mathrm{GD}} = 1.245\ldots$. These predictions are validated by comparison with the results of
extensive numerical experiments in section 5. The final section is devoted to a summary.



2. Model definition

A simple perceptron is a map from $\mathbb{R}^N$ to $\{+1, -1\}$ defined as

$$y = \begin{cases} +1, & S \cdot x / \sqrt{N} > 0, \\ -1, & S \cdot x / \sqrt{N} < 0, \end{cases} \qquad (7)$$

where $x \in \mathbb{R}^N$ is the input pattern and $y \in \{+1, -1\}$ is the output label. The vector $S$
denotes the adjustable synaptic weight. We hereinafter focus on the case of Ising weights
$S_i \in \{+1, -1\}$. In a general scenario, the perceptron stores a given set of $M$ labeled
patterns

$$D^M = \{(x_1, y_1), \ldots, (x_M, y_M)\}, \qquad (8)$$

by adjusting the weight $S$ so as to completely reproduce the given label $y_\mu$ for the input
$x_\mu$ for $\mu = 1, 2, \ldots, M$.

In the following, we consider the situation in which the patterns are independently
and identically distributed samples from

$$P(x) = \frac{1}{(2\pi)^{N/2}} \exp\left( -\frac{|x|^2}{2} \right), \qquad (9)$$

$$P(y) = \frac{1}{2} \left( \delta(y - 1) + \delta(y + 1) \right). \qquad (10)$$

The question we address herein is how the space of the weights that store $D^M$ is
characterized macroscopically when the pattern ratio $\alpha = M/N \sim O(1)$ is fixed as $M$
and $N$ tend to infinity.
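
For small $N$, the set of Ising weights that store a given sample can be inspected directly. The following Python sketch draws random patterns (Gaussian inputs and unbiased binary labels, as assumed above) and counts the compatible weights by brute force over all $2^N$ candidates. The function names `sample_patterns`, `all_ising_weights`, `compatible_mask`, and `count_compatible_weights` are illustrative assumptions, and the exhaustive loop is only feasible for small $N$.

```python
import itertools

import numpy as np


def sample_patterns(N, M, rng):
    """Draw M i.i.d. patterns: x ~ N(0, I_N), y uniform on {+1, -1}."""
    X = rng.standard_normal((M, N))
    y = rng.choice([-1, 1], size=M)
    return X, y


def all_ising_weights(N):
    """All 2^N weight vectors S in {+1, -1}^N, one per row."""
    return np.array(list(itertools.product([1, -1], repeat=N)))


def compatible_mask(S_all, X, y):
    """True for each weight S with y_mu * (S . x_mu) / sqrt(N) > 0 for all mu."""
    N = S_all.shape[1]
    fields = (S_all @ X.T) / np.sqrt(N)  # shape (2^N, M)
    return np.all(fields * y > 0, axis=1)


def count_compatible_weights(X, y):
    """Number of Ising weights that reproduce every label of the sample."""
    N = X.shape[1]
    return int(compatible_mask(all_ising_weights(N), X, y).sum())


rng = np.random.default_rng(0)
X, y = sample_patterns(N=10, M=5, rng=rng)
Z = count_compatible_weights(X, y)  # varies from sample to sample
```

Because flipping all components of $S$ reverses the sign of $S \cdot x$, a single pattern is stored by exactly half of the $2^N$ weights (barring the zero-probability event $S \cdot x = 0$); each additional pattern can only shrink the compatible set.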



3. Formalism

3.1. RS and 1RSB solutions of the generating function

As bases of our analysis, we first provide expressions of RS and 1RSB solutions of the
generating function. Since these solutions have been derived numerous times in earlier
studies [14, 15], we present only a sketch of the derivation in the main text, and details
are shown in Appendix A. For readers who are not familiar with the replica method,
we refer to [19, 20].

We first define the Boltzmann factor $\eta(S|D^M)$ of the present system so that it takes 1 if
the weight $S$ is compatible with $D^M$ and 0 otherwise‖. The explicit form is expressed
as

$$\eta(S|D^M) = \prod_{\mu=1}^{M} \Theta\left( \frac{y_\mu S \cdot x_\mu}{\sqrt{N}} \right), \qquad (11)$$

where $\Theta(u) = 1$ for $u > 0$ and $\Theta(u) = 0$, otherwise. The partition function $Z(D^M) =
\sum_S \eta(S|D^M)$ is equal to the number of weights that are perfectly compatible with $D^M$
in this situation, and varies randomly depending on the quenched randomness $D^M$. This
naturally leads us to evaluate the generating function $\phi(n) = (1/N) \log [Z^n(D^M)]_{D^M}$
using the replica method, where $[\cdots]_{D^M}$ represents the operation of averaging with
respect to $D^M$. For $n = 1, 2, \ldots \in \mathbb{N}$, this yields the following expression:

$$\phi(n) = \mathop{\mathrm{Extr}}_{q, \hat{q}} \left\{ -\sum_{a<b} \hat{q}^{ab} q^{ab} + \log \sum_{S^1, \ldots, S^n} \exp\left( \sum_{a<b} \hat{q}^{ab} S^a S^b \right) + \alpha \log \left[ \prod_{a=1}^{n} \Theta(u^a) \right]_u \right\}, \qquad (12)$$

where $[\cdots]_u$ represents averaging with respect to multivariate Gaussian random variables
$u^a$, the first and second moments of which are specified as $[u^a]_u = 0$ and
$[u^a u^b]_u = \delta_{ab} + (1 - \delta_{ab}) q^{ab}$ ($a, b = 1, 2, \ldots, n$), respectively.
Analytical continuation from $n \in \mathbb{N}$ to $n \in \mathbb{R}$ is performed by imposing a certain
permutation symmetry on the extremum point of the right-hand side of (12). We find
several solutions in the RS and 1RSB levels.



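3.1.1. RS solutions Constraints $q^{ab} = q$ and $\hat{q}^{ab} = \hat{q}$ characterize the RS solutions.
Solving the extremization problem of (12) analytically and numerically under these
constraints yields the following two solutions:
RS1: $0 < q < 1$ and $\hat{q} < +\infty$.

$$\phi_{\mathrm{RS1}}(n) = \mathop{\mathrm{Extr}}_{q, \hat{q}} \left\{ -\frac{n(n-1)}{2} \hat{q} q - \frac{n}{2} \hat{q} + \log \int Dz \left( 2 \cosh \sqrt{\hat{q}} z \right)^n + \alpha \log \int Dz\, E^n\left( \sqrt{\frac{q}{1-q}}\, z \right) \right\}, \qquad (13)$$

where $Dz = dz \exp(-z^2/2) / \sqrt{2\pi}$ represents the Gaussian measure and $E(u) = \int_u^\infty Dz$.
RS2: $q = 1$ and $\hat{q} = +\infty$.

$$\phi_{\mathrm{RS2}}(n) = (1 - \alpha) \log 2. \qquad (14)$$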

‖ This is equivalent to the zero temperature limit $\beta \to \infty$ of the Boltzmann factor $e^{-\beta H(S|D^M)}$, where
the Hamiltonian $H(S|D^M)$ is given by $\sum_{\mu=1}^{M} \Theta(-y_\mu S \cdot x_\mu / \sqrt{N})$, which is equal to the number of patterns
that are incompatible with the weight $S$.

3.1.2. 1RSB solutions In 1RSB solutions, replica indices are divided into $n/m$ groups
of identical size $m$. Constraints for characterizing the 1RSB solutions are expressed as

$$q^{ab} = \begin{cases} q_1, & \text{if } a \text{ and } b \text{ belong to the same group}, \\ q_0, & \text{otherwise}, \end{cases} \qquad (15)$$

and are similarly expressed for $\hat{q}^{ab}$. Three solutions are found under these constraints:
1RSB1: $(q_1, q_0) = (1, q)$ and $(\hat{q}_1, \hat{q}_0) = (+\infty, \hat{q})$, where $q$ and $\hat{q}$ take the same values as
those for $\phi_{\mathrm{RS1}}(n/m)$.

$$\phi_{\mathrm{1RSB1}}(n, m) = \phi_{\mathrm{RS1}}\left( \frac{n}{m} \right). \qquad (16)$$

1RSB2: $(q_1, q_0) = (q, q)$ and $(\hat{q}_1, \hat{q}_0) = (\hat{q}, \hat{q})$, where $q$ and $\hat{q}$ take the same values as
those for $\phi_{\mathrm{RS1}}(n)$.

$$\phi_{\mathrm{1RSB2}}(n, m) = \phi_{\mathrm{RS1}}(n). \qquad (17)$$

1RSB3: $(q_1, q_0) = (1, 1)$ and $(\hat{q}_1, \hat{q}_0) = (+\infty, +\infty)$.

$$\phi_{\mathrm{1RSB3}}(n, m) = \phi_{\mathrm{RS2}}(n) = (1 - \alpha) \log 2. \qquad (18)$$

In usual analyses, Parisi's 1RSB parameter $m$ is determined by the extremum
condition in evaluating $\phi(n) = \mathrm{Extr}_m\{\phi_{\mathrm{1RSB}*}(n, m)\}$, where $* = 1, 2$ and 3. In
addition, there might be no need to classify 1RSB2 and 1RSB3 as 1RSB solutions
because 1RSB2 and 1RSB3 are completely reduced to RS1 and RS2, respectively.
However, handling these three solutions as 1RSB solutions, leaving the $m$-dependence of
$\phi_{\mathrm{1RSB}}(n, m)$ explicit, is crucial for the current purpose of relating the concepts of $\phi(n)$,
$\Sigma(s)$, and $R(s)$ based on physical considerations presented in the following subsection.

3.2. 1RSB solution as the generating function of complexity and rate function

Let us present the 1RSB solutions through a physical inference based on arguments
presented in earlier studies [5, 6, 12, 16, 17]. Here, we assume a situation in which the
weight space is divided into exponentially many pure states for a given sample of $D^M$.
We introduce an indicator function $\delta_\gamma(S)$, which is defined as $\delta_\gamma(S) = 1$ if $S$
belongs to the pure state $\gamma$ and 0 otherwise, to express the number of weights included
in $\gamma$ as

$$Z_\gamma = \sum_S \delta_\gamma(S)\, \eta(S|D^M). \qquad (19)$$

Let us assume that $Z_\gamma$ typically scales as $Z_\gamma \sim \exp(N s)$, where $s \sim O(1)$ has the
physical meaning of entropy (density), and the number of pure states corresponding
to the value of the entropy $s$ increases as $\mathcal{N}(s) \sim \exp(N \Sigma(s))$, where $\Sigma(s) \ge 0$ is the
complexity for the entropy $s$. This assumption, in conjunction with the saddle point
assessment, provides us with a generating function of $\Sigma(s)$, as follows:

$$g(x|D^M) = \frac{1}{N} \log \sum_\gamma Z_\gamma^x = \max_s \{ x s + \Sigma(s) \}. \qquad (20)$$

This relationship indicates that when $\Sigma(s)$ is a convex upward function, it can be
assessed from $g(x|D^M)$ as

$$s(x) = \frac{\partial g(x|D^M)}{\partial x}, \quad \Sigma(s(x)) = g(x|D^M) - x \frac{\partial g(x|D^M)}{\partial x}, \qquad (21)$$

being parameterized by $x$. Here, $g(x|D^M)$ is defined for each sample of $D^M$. However,
the self-averaging property is assumed to hold in the current system, which means that
$g(x|D^M)$ for typical samples converges to its average $g(x) = [g(x|D^M)]_{D^M}$ in a large
system limit of $N, M \to \infty$ while maintaining $\alpha = M/N \sim O(1)$.

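The replica method can be used to assess $g(x)$. For this, we consider the following
identity:

$$g(x) = \frac{1}{N} \left[ \log \sum_\gamma Z_\gamma^x \right]_{D^M} = \lim_{y \to 0} \frac{\partial}{\partial y} \frac{1}{N} \log \left[ \left( \sum_\gamma Z_\gamma^x \right)^y \right]_{D^M}. \qquad (22)$$

Although exact evaluation of the right-hand side of (22) is difficult, for $x, y \in \mathbb{N}$,
equation (19) and the formula of series expansion provide the following expression:

$$\left[ \left( \sum_\gamma Z_\gamma^x \right)^y \right]_{D^M} = \left[ \sum_{\gamma_1, \ldots, \gamma_y} \sum_{\{S^{\sigma a}\}} \prod_{\mu=1}^{M} \prod_{\sigma=1}^{y} \prod_{a=1}^{x} \Theta\left( \frac{y_\mu S^{\sigma a} \cdot x_\mu}{\sqrt{N}} \right) \times \prod_{\sigma=1}^{y} \prod_{a=1}^{x} \delta_{\gamma_\sigma}(S^{\sigma a}) \right]_{D^M}, \qquad (23)$$

which can be evaluated by the saddle point method in the large system limit.
The following observations are noteworthy in the evaluation.

• The summation is taken over all possible configurations of $xy$ replica weights.

• However, the factor $\prod_{\sigma=1}^{y} \prod_{a=1}^{x} \delta_{\gamma_\sigma}(S^{\sigma a})$ allows only contributions from
configurations in which $xy$ replica weights are equally assigned to $y$ pure states
by $x$.

These observations are nothing more than the physical meaning of the 1RSB ansatz in
assessing $[Z^n(D^M)]_{D^M}$, with substitution of $n = xy$ and $m = x$ (figure 1). Accepting
this interpretation yields the following expression:

$$\frac{1}{N} \log \left[ \left( \sum_\gamma Z_\gamma^x \right)^y \right]_{D^M} = \phi_{\mathrm{1RSB}}(xy, x), \qquad (24)$$

where $\phi_{\mathrm{1RSB}}(n, m)$ is the 1RSB solution considered in the previous section, and its
concrete functional form should be chosen appropriately from among 1RSB1, 1RSB2,
and 1RSB3 for a given pair of $\alpha$ and $n$.

Inserting (24) into (22) yields $g(x) = x (\partial/\partial n) \phi_{\mathrm{1RSB}}(n, x)|_{n=0}$, which directly yields
the following formula relating typical complexity to $\phi_{\mathrm{1RSB}}(n, m)$:

$$s(x) = \frac{\partial}{\partial x} \left( x \frac{\partial}{\partial n} \phi_{\mathrm{1RSB}}(n, x)|_{n=0} \right), \quad \Sigma(s(x)) = -x^2 \frac{\partial^2}{\partial x \partial n} \phi_{\mathrm{1RSB}}(n, x)|_{n=0}. \qquad (25)$$

Figure 1. Schematic diagram of the 1RSB structure of the factor $\prod_{\sigma=1}^{y} \prod_{a=1}^{x} \delta_{\gamma_\sigma}(S^{\sigma a})$.
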
On the other hand, an identity with respect to the indicator function, $\sum_\gamma \delta_\gamma(S) = 1$
for $\forall S$, guarantees $\sum_\gamma Z_\gamma = Z(D^M)$, indicating that $\phi(n) = \phi_{\mathrm{1RSB}}(n, x)|_{x=1}$ holds in
general. This means that the rate function can be assessed from $\phi_{\mathrm{1RSB}}(n, m)$ as follows:

$$s_{\mathrm{tot}}(n) = \frac{\partial}{\partial n} \phi_{\mathrm{1RSB}}(n, x)|_{x=1}, \quad R(s_{\mathrm{tot}}(n)) = -n^2 \frac{\partial}{\partial n} \left( n^{-1} \phi_{\mathrm{1RSB}}(n, x)|_{x=1} \right), \qquad (26)$$

where we define the total entropy $s_{\mathrm{tot}} = \lim_{N \to \infty} (1/N) \log Z = \max_s \{ s + \Sigma(s) \}$, which
corresponds to the total number of weights that are compatible with $D^M$. In (25), the
parameter $x$ can vary only in such a range that both $s(x) \ge 0$ and $\Sigma(s(x)) \ge 0$ hold.
Similarly, the conditions $s_{\mathrm{tot}}(n) \ge 0$ and $R(s_{\mathrm{tot}}(n)) \le 0$ restrict the range of $n$ in (26).
These constitute the main result of the present paper.
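
The map from a 1RSB generating function to the pair $(s(x), \Sigma(s(x)))$ can be sketched numerically: with $g(x) = x\,(\partial/\partial n)\phi_{\mathrm{1RSB}}(n, x)|_{n=0}$, one has $s(x) = \partial g/\partial x$ and $\Sigma(s(x)) = g(x) - x\,\partial g/\partial x$. The Python sketch below uses finite differences and a hypothetical smooth stand-in `phi_rs_standin` in place of the actual RS1 solution (which requires solving saddle-point equations); all function names and the stand-in coefficients are assumptions for illustration.

```python
def dphi_dn_at_zero(phi_1rsb, x, h=1e-5):
    """Central-difference estimate of d phi_1RSB(n, x) / dn at n = 0."""
    return (phi_1rsb(h, x) - phi_1rsb(-h, x)) / (2.0 * h)


def g_typical(phi_1rsb, x):
    """g(x) = x * d phi_1RSB(n, x) / dn at n = 0."""
    return x * dphi_dn_at_zero(phi_1rsb, x)


def entropy_and_complexity(phi_1rsb, x, h=1e-4):
    """Return (s(x), Sigma(s(x))) with s = dg/dx and Sigma = g - x dg/dx."""
    dg = (g_typical(phi_1rsb, x + h) - g_typical(phi_1rsb, x - h)) / (2.0 * h)
    return dg, g_typical(phi_1rsb, x) - x * dg


def phi_rs_standin(n):
    """Hypothetical smooth stand-in for an RS solution, with slope 0.314 at n = 0."""
    return 0.314 * n - 0.2 * n ** 2


def phi_type1(n, m):
    """Same m-dependence as the 1RSB1 form: the RS function evaluated at n/m."""
    return phi_rs_standin(n / m)


def phi_type2(n, m):
    """Same m-dependence as the 1RSB2 form: no dependence on m."""
    return phi_rs_standin(n)


s1, sigma1 = entropy_and_complexity(phi_type1, 0.8)
s2, sigma2 = entropy_and_complexity(phi_type2, 1.5)
```

Under these assumptions, the first form gives vanishing entropy with complexity equal to the slope at $n = 0$, and the second gives entropy equal to that slope with vanishing complexity.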

Here, three issues are noteworthy. First, for a class of disordered systems, including
random energy models and $p$-body spin glass models without external fields, two
equalities, $\phi_{\mathrm{1RSB}}(n, m) = (n/m)\phi_{\mathrm{RS}}(m)$ and $\phi_{\mathrm{1RSB}}(n, m = 1) = \phi_{\mathrm{RS}}(n)$, hold in assessing
the complexity and rate function, respectively, where $\phi_{\mathrm{RS}}(n)$ is an identical RS solution
of the generating function $\phi(n)$. Inserting these functions into (25) and (26) offers an
identical functional form for both the complexity and the rate function, while their
domains of definition are disjoint, except for a point of the typical value of free energy
$f^*$ (or entropy $s^*$). The current system, however, does not possess this property because
$\phi_{\mathrm{1RSB}}(n, m) = (n/m)\phi_{\mathrm{RS}}(m)$ does not hold for 1RSB1, 1RSB2, or 1RSB3, while
$\phi_{\mathrm{1RSB}}(n, m = 1) = \phi_{\mathrm{RS}}(n)$ is always satisfied. Second, (25) and (26) are valid only
when $\phi_{\mathrm{1RSB}}(n, m)$ is stable against any perturbation for a further RSB. Fortunately,
in the present problem, a stable solution against any known RSB instabilities can be
constructed for $\forall \alpha > 0$ and $\forall n > 0$. This implies that, in the present analysis, there is
no need to consider further RSB. Finally, however, we have to keep in mind that (25)
and (26) depend on the assumptions that the correct $\Sigma(s)$ and $R(s)$ are convex upward
functions, respectively. When the convex upward property does not hold, the estimates
of (25) and (26) represent not the correct solution, but rather its convex hull. The
following analytical and experimental assessment indicates that this is the case for $\Sigma(s)$
of sufficiently low $\alpha$ and $R(s)$ of sufficiently high $\alpha$.

4. Theoretical predictions

We are now ready to use the formalism developed above to analyze the behavior of the
weight space of the Ising perceptron.

4.1. Complexity for $\alpha < \alpha_s = 0.833\ldots$

In order to perform the analysis, it is necessary to select a certain solution (functional
form) from among the three candidates of 1RSB1, 1RSB2, and 1RSB3. Analyticity
and physical plausibility are two guidelines for this task.

The replica method is a scheme to infer the properties for real replica numbers
$n \in \mathbb{R}$ by analytical continuation from those for natural numbers $n = 1, 2, \ldots \in \mathbb{N}$.
This indicates that, for examining typical ($n \to 0$) behavior, it is plausible to select the
solution of $\phi(n)$ that is dominant around $n \gtrsim 1$, because unity is the natural number
that is closest to zero. For $\alpha < \alpha_s = 0.833\ldots$, this solution is $\phi_{\mathrm{RS1}}(n)$. In addition, the
relevant $\phi_{\mathrm{1RSB}}(n, m)$ must agree with this solution at $m = 1$. These considerations offer
two candidates of $g(x)$ as

$$g_{\mathrm{1RSB1}}(x) = x \frac{\partial}{\partial n} \phi_{\mathrm{1RSB1}}(n, x)|_{n=0} = \phi'_{\mathrm{RS1}}(0), \qquad (27)$$

and

$$g_{\mathrm{1RSB2}}(x) = x \frac{\partial}{\partial n} \phi_{\mathrm{1RSB2}}(n, x)|_{n=0} = x \phi'_{\mathrm{RS1}}(0). \qquad (28)$$

We combine these solutions to construct an entire functional form of $g(x)$ based
on physical considerations. For $x \gg 1$, $g(x)$ should vary approximately linearly with
respect to $x$, because a single pure state of the largest entropy typically dominates
$\sum_\gamma Z_\gamma^x$. In addition, $s(x) = (\partial/\partial x) g(x)$ for $x \sim 0$ should be smaller than that for $x \gg 1$
because $s(x)$ should increase monotonically with respect to $x$. Furthermore, $g(x)$ must
be a continuous function. These considerations reasonably yield an entire functional
form of $g(x)$ as

$$g(x) = \begin{cases} \phi'_{\mathrm{RS1}}(0), & x \le 1, \\ x \phi'_{\mathrm{RS1}}(0), & x > 1, \end{cases} \qquad (29)$$

which yields the complexity as

$$\Sigma(s) = \begin{cases} \phi'_{\mathrm{RS1}}(0) - s, & 0 \le s \le \phi'_{\mathrm{RS1}}(0), \\ -\infty, & \text{otherwise}. \end{cases} \qquad (30)$$

The piecewise linear profile of (29) is somewhat extraordinary. This is thought to
be because the correct complexity is not convex upward in this system. When $\Sigma(s)$ is
convex upward, the current formalism using the saddle-point method defines a one-to-one
map between $g(x)$ and $\Sigma(s)$. However, if $\Sigma(s)$ is not convex upward, the functional
profile of a region in which the correct complexity is convex downward is lost and only
the convex hull is obtained by the transformation from $g(x)$, as shown in figure 2. The
piecewise linear profile of $g(x)$ presumably signals that this actually occurs in the current
problem. Similar behavior of the complexity could also be observed in a certain type of
random energy models [21].
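
The passage from a piecewise linear $g(x)$ to a linear complexity can be illustrated numerically. Since $g(x) = \max_s\{xs + \Sigma(s)\}$, the convex hull of the complexity is recovered as $\Sigma(s) = \min_x\{g(x) - xs\}$. The Python sketch below assumes $g(x) = s_0$ for $x \le 1$ and $g(x) = x s_0$ for $x > 1$, restricts $x$ to a finite grid on $[0, 2]$, and takes $s_0 = 0.314$ (the truncated value of $g(0)$ quoted for $\alpha = 0.5$). The names `g_piecewise`, `complexity_hull`, and `S0` are illustrative.

```python
import numpy as np

S0 = 0.314  # truncated value of g(0) quoted for alpha = 0.5; used for illustration


def g_piecewise(x, s0):
    """g(x) = s0 for x <= 1 and g(x) = x * s0 for x > 1."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 1.0, s0, x * s0)


def complexity_hull(s, s0, x_grid):
    """Sigma(s) = min over the x grid of { g(x) - x s } (inverse Legendre transform)."""
    g = g_piecewise(x_grid, s0)
    return np.min(g[None, :] - np.outer(s, x_grid), axis=1)


x_grid = np.linspace(0.0, 2.0, 41)   # grid containing the kink at x = 1
s_grid = np.linspace(0.0, S0, 11)    # entropies between 0 and s0
sigma = complexity_hull(s_grid, S0, x_grid)
```

On this grid the minimum is attained at the kink $x = 1$ for every $0 \le s \le s_0$, so `sigma` follows the straight line $s_0 - s$ connecting $(0, s_0)$ and $(s_0, 0)$.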

The physical implication of (30), the profile of which is obtained by connecting
two points $(s, \Sigma) = (0, \phi'_{\mathrm{RS1}}(0))$ and $(\phi'_{\mathrm{RS1}}(0), 0)$ with a straight line having a slope
of $-x = -1$, is that the weight space is equally dominated by exponentially many
clusters of vanishing entropy and a subexponential number of large clusters composed
of exponentially many weights. The existence of large clusters may accord with an
earlier study which reported that local search heuristics of a certain type manage to
find a compatible weight efficiently up to a considerably large value of $\alpha$ near the
capacity $\alpha_s$ [22]. On the other hand, the coexisting exponentially many small clusters
may be a major origin of a known difficulty in finding compatible weights by Monte
Carlo sampling schemes [23, 24].




Figure 2. Schematic profile of complexity (30). The characteristic exponent of the
size distribution of pure states cannot be correctly assessed in the current formalism if
it is a convex downward function (dashed curves). In such cases, the complexity $\Sigma(s)$
assessed from $g(x)$ (solid line) is the convex hull (black circle) of the correct exponent.



4.2. Rate function for $\alpha > \alpha_s = 0.833\ldots$ and a transition at $\alpha_{\mathrm{GD}} = 1.245\ldots$

For $\alpha_s < \alpha$, (30) becomes negative, which implies that there exist no compatible weights
for typical samples of $D^M$. In such cases, the rate function $R(s)$, which characterizes a
small probability that atypical samples that are compatible with the Ising perceptrons
are generated, becomes relevant in the current analysis. Therefore, we focus on the
assessment of this exponent for this region.

For $\alpha_s < \alpha < \alpha_{\mathrm{GD}} = 1.245\ldots$, $\phi_{\mathrm{RS1}}(n)$ dominates the generating function $\phi(n)$
in the vicinity of $n \gtrsim 1$ as for $\alpha < \alpha_s$. This means that $\phi_{\mathrm{1RSB1}}(n, m = 1) =
\phi_{\mathrm{1RSB2}}(n, m = 1) = \phi_{\mathrm{RS1}}(n)$ should be used to assess $R(s)$ of relatively frequent events
that correspond to $0 < n < 1$. However, this function is minimized to a negative value at
a certain point $n_s(\alpha)$ with $0 < n_s(\alpha) < 1$, which implies that assessment by naively using
$\phi_{\mathrm{RS1}}(n)$ for $n < n_s(\alpha)$ leads to incorrect results, which yield a negative total entropy
$s_{\mathrm{tot}}(n) = (\partial/\partial n)\phi_{\mathrm{RS1}}(n) < 0$. In order to avoid this inconsistency, we fix the value of
$\phi(n)$ to $\phi_{\mathrm{RS1}}(n_s(\alpha))$, which is reduced to the conventional construction of a frozen RSB
solution. In particular, this yields an assessment of

$$R(0) = \phi_{\mathrm{RS1}}(n_s(\alpha)) = \min_n \{ \phi_{\mathrm{RS1}}(n) \}, \qquad (31)$$

which has the physical meaning of a characteristic exponent of a small probability that a
given sample set $D^M$ is separable by certain Ising perceptrons. For $\alpha > \alpha_{\mathrm{GD}} = 1.245\ldots$,
on the other hand, the dominant solution of $\phi(n)$ in the vicinity of $n \gtrsim 1$ is updated
from $\phi_{\mathrm{RS1}}(n)$ to $\phi_{\mathrm{RS2}}(n) = (1 - \alpha) \log 2$, which yields

$$R(0) = \phi_{\mathrm{RS2}}(n) = (1 - \alpha) \log 2. \qquad (32)$$

In order to provide a visual representation of the above discussions, we depict the
behaviors of $\phi(n)$ in figure 3.

Figure 3. Behavior of $\phi(n)$. The solid lines denote the correct $\phi(n)$, and the dotted
lines are the RS and frozen RSB branches. The corresponding values of the parameter
$\alpha$ are 0.5, 0.95, and 1.4, from left to right.



The difference in physical behavior between $\alpha_s < \alpha < \alpha_{\mathrm{GD}}$ and $\alpha > \alpha_{\mathrm{GD}}$ is
expected to be as follows. For $\alpha_s < \alpha < \alpha_{\mathrm{GD}}$, the dominant solution around $n \gtrsim n_s(\alpha)$,
$\phi(n) = \phi_{\mathrm{RS1}}(n)$, varies smoothly. This leads to the following behavior of $R(s)$ in the
vicinity of $s = 0$:

R{s) = R{0) -As^ + ..., (33) 

where $A > 0$ is a certain constant, which implies that large clusters can appear with
a relatively large probability although typical samples of $D^M$ are not separable by the
Ising perceptrons. On the other hand, for $\alpha > \alpha_{\mathrm{GD}}$, $\phi(n) = \phi_{\mathrm{RS2}}(n)$ is constant for
$n < n_{\mathrm{GD}}(\alpha)$, which is characterized by $\phi_{\mathrm{RS2}}(n_{\mathrm{GD}}(\alpha)) = \phi_{\mathrm{RS1}}(n_{\mathrm{GD}}(\alpha))$ and $n_{\mathrm{GD}}(\alpha) > 1$,
and is switched to $\phi(n) = \phi_{\mathrm{RS1}}(n)$ for $n > n_{\mathrm{GD}}(\alpha)$ at $n = n_{\mathrm{GD}}(\alpha)$, which is accompanied
by a jump in the first derivative. This indicates that (upward) convexity does not hold
for $R(s)$ in the region of $0 < s < (\partial/\partial n)\phi_{\mathrm{RS1}}(n_{\mathrm{GD}}(\alpha))$, as was mentioned for $\Sigma(s)$ in the
previous subsection, which implies that the events of $s = 0$ overwhelm those of $s > 0$ in
relative probabilities. Therefore, the generation of large clusters should be considerably
rare for $\alpha$ of this region.

4.3. Phase diagram on the $n$-$\alpha$ plane

The above considerations are sufficient to draw a phase diagram on the $n$-$\alpha$ plane, which
is depicted in figure 4.

Figure 4. Phase diagram on the $n$-$\alpha$ plane. Solid lines are phase boundaries, and
the dotted line denotes $n = 1$. The dotted-dashed line expresses the AT line for the
RS1 solution, but is irrelevant. The AT line vanishes at a certain value of $\alpha$, because
the solution for $0 < q < 1$ vanishes at this point. The RS2 solution involves the AT
instability only on the $n = 0$ line, which is presumably of no relevance in the replica
analysis.

The value of the tricritical point $\alpha_{\mathrm{GD}} = 1.245\ldots$ is identical to the critical
ratio of the perfect learning of the Ising perceptrons in the teacher-student scenario
[25, 26]. Formally, this agreement is explained as follows. The dominant solution for
$n < 1$ is determined by whether $\phi_{\mathrm{RS1}}(n)$ or $\phi_{\mathrm{RS2}}(n)$ dominates around $n \gtrsim 1$. Since
$\phi_{\mathrm{RS1}}(n = 1) = \phi_{\mathrm{RS2}}(n = 1)$ is always guaranteed, the critical condition is given as
$(\partial/\partial n)\phi_{\mathrm{RS1}}(n)|_{n=1} = (\partial/\partial n)\phi_{\mathrm{RS2}}(n)|_{n=1} = 0$. On the other hand, $(\partial/\partial n)\phi_{\mathrm{RS1}}(n)|_{n=1}$
generally provides the total entropy after learning in the teacher-student scenario, the
target of which can be dealt with as an $(n + 1)$-replicated system, in which the teacher is
handled as an extra replica. Therefore, the condition of perfect learning, which indicates
that the weight of the student agrees perfectly with that of the teacher after learning,
is identical to the vanishing entropy condition of the $(n + 1)$-replicated system in the
limit $n \to 0$, which agrees with $(\partial/\partial n)\phi_{\mathrm{RS1}}(n)|_{n=1} = 0$, giving the critical value $\alpha_{\mathrm{GD}}$
in the current problem. Although the agreement is justified formally in this manner, its
physical implication remains somewhat unclear. The line $n = 1$, which passes through
the tricritical point, may have an analogous relation to the concept of Nishimori's line
in the theory of spin glasses [IHl EZ].

Finally, we mention the de Almeida-Thouless (AT) condition in this model [28].
The AT (stability) condition of $\phi_{\mathrm{RS}}(n)$ with the order parameters $q$ and $\hat{q}$ is expressed
as follows:

a I D^E- J Dz cosh-^ y^z ^ ^ 

(l-g)2 jDzE- J Dz cosh^J^z " ^' ^ 
An outline of the derivation is given in [2]. This condition for $\phi_{\mathrm{RS1}}(n)$ is broken in a
certain region on the $n$-$\alpha$ plane, but is irrelevant because the region is always included
in $n < n_s(\alpha)$, for which the relevant solution is already switched to that of the frozen
RSB. On the other hand, $\phi_{\mathrm{RS2}}$ is stable for $n > 0$ but becomes unstable only on $n = 0$,
as reported in [H]. The relevance of this instability for $\alpha > \alpha_{\mathrm{GD}}$ may require a more
detailed discussion, but we assume herein that this instability can be ignored because
only the asymptotic behavior of $\phi(n)$ in the limit $n \to 0$ is relevant in procedures of the
replica method.

5. Numerical validation 

For validating the theoretical predictions obtained in the previous section, we carried
out extensive numerical experiments. In describing the experiments, let us first define
the cluster in the present problem. The cluster is a set of spin configurations that are
stable with respect to single spin flips [16, 17, 18]. Clusters have the following properties:

• Any configuration belongs to a cluster.

• When a spin configuration "A" can be moved to another configuration "B" by a
single spin flip without changing the number of incompatible patterns, "A" and
"B" belong to the same cluster.

In the following, we concentrate on vanishing energy clusters, which are composed of
weights that are perfectly compatible with $D^M$.

Before going into details, we elucidate the relation between the cluster and the
pure state. Identifying the microscopic description of a pure state is generally a delicate
problem, but in the Ising perceptron a pure state can be identified with a cluster,
as mentioned in section 1. There is no proof of this statement but it is naturally
understood by considering the following aspects of the present problem: The Boltzmann
weight $\eta(S|D^M)$ in (11) becomes completely zero if there is any incompatible
pattern. This means that moving from a cluster to a different cluster by single spin
flips is impossible because those clusters are completely separated by states with zero
probability $\eta(S|D^M) = 0$. This naturally leads to identifying a cluster with a pure state,
because a pure state is a set of configurations which cannot be accessed from other sets
by natural dynamics. Several earlier studies support this description [16, 17, 18], and
we hereafter adopt this assumption.

Now, let us return to the experiments. We denote the size of a cluster as $Q$ and the
number of size-$Q$ clusters for a sample $D^M$ as $C(Q|D^M)$. The entropy of a cluster $s$ is
identified as $s = (1/N) \log Q$, and the complexity $\Sigma(s|D^M)$ corresponds
to $(1/N) \log C(Q|D^M)$. The clusters can be numerically evaluated, and hence we can
construct the 1RSB generating function from the numerical data as

$$\phi_{\mathrm{1RSBnum}}(xy, x) = \frac{1}{N} \log \left[ \left( \sum_Q C(Q|D^M)\, Q^x \right)^y \right]_{D^M}, \qquad (35)$$

where $[\cdots]_{D^M}$ denotes the sample average operation with respect to $D^M$. In the typical
limit $y \to 0$, this yields the following expression:

$$\lim_{y \to 0} \frac{\partial}{\partial y} \phi_{\mathrm{1RSBnum}}(xy, x) = \frac{1}{N} \frac{\left[ \Theta\left( \sum_Q C(Q|D^M)\, Q^x \right) \log \left( \sum_Q C(Q|D^M)\, Q^x \right) \right]_{D^M}}{\left[ \Theta\left( \sum_Q C(Q|D^M)\, Q^x \right) \right]_{D^M}}, \qquad (36)$$

where the step function $\Theta(x)$ comes from the differentiation of $\log [ ( \sum_Q C(Q|D^M)\, Q^x )^y ]_{D^M}$ with
respect to $y$. This means that if there is no cluster for a sample $D^M$, then the
contribution of $\log ( \sum_Q C(Q|D^M)\, Q^x )$ vanishes.

In order to examine the consistency with the replica analysis, we assess (36) based
on data obtained in extensive numerical experiments. The function $g_{\mathrm{num}}(x)$ is evaluated
by the exact enumeration of weights that are compatible with $D^M$, which are referred
to hereinafter as solutions. The procedure is summarized as follows:

(i) Generate $M$ examples $D^M = \{(y_1, x_1), \ldots, (y_M, x_M)\}$.

(ii) Enumerate all solutions.

(iii) Partition the solutions into clusters, and calculate $\sum_Q C(Q|D^M)\, Q^x$ for an appropriate set of
$x$. We actually took 41 points between $x = 0$ and 2.0.

(iv) Repeat the above procedures until sufficient data are obtained and calculate
$g_{\mathrm{num}}(x)$ by taking the sample average.
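
Steps (ii) and (iii) can be sketched for a single sample as follows. This Python sketch enumerates the solutions by brute force, groups them into clusters as connected components under single spin flips, and evaluates the single-sample quantity $(1/N) \log \sum_\gamma Q_\gamma^x$, where $Q_\gamma$ is the size of cluster $\gamma$. The function names are hypothetical, the sample average of step (iv) is not included, and the approach is only practical for small $N$.

```python
import itertools

import numpy as np


def enumerate_solutions(X, y):
    """Set of all S in {+1, -1}^N with y_mu * (S . x_mu) > 0 for every pattern.

    The factor 1/sqrt(N) is omitted because only the sign matters.
    """
    N = X.shape[1]
    return {S for S in itertools.product([1, -1], repeat=N)
            if np.all(y * (X @ np.array(S)) > 0)}


def cluster_sizes(solutions):
    """Sizes of connected components of solutions linked by single spin flips."""
    unvisited = set(solutions)
    sizes = []
    while unvisited:
        stack = [unvisited.pop()]
        size = 0
        while stack:
            S = stack.pop()
            size += 1
            for i in range(len(S)):
                T = S[:i] + (-S[i],) + S[i + 1:]  # flip spin i
                if T in unvisited:
                    unvisited.remove(T)
                    stack.append(T)
        sizes.append(size)
    return sizes


def g_single_sample(sizes, x, N):
    """(1/N) log sum_gamma Q_gamma^x; requires at least one cluster."""
    Q = np.asarray(sizes, dtype=float)
    return np.log(np.sum(Q ** x)) / N


rng = np.random.default_rng(1)
N, M = 10, 5
X = rng.standard_normal((M, N))
y = rng.choice([-1, 1], size=M)
sizes = cluster_sizes(enumerate_solutions(X, y))
```

Samples without any solution yield an empty `sizes` list and have to be treated separately, in line with the role of the step function in the sample average.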

{x) for a = 0.5 are shown in figure O As the system size 



(E^ Qi) log (e, Q', 

The resultant plots of 




Figure 5. Behavior of $g_{\rm num}(x)$ for $\alpha = 0.5$. The system sizes are $N = 14, 16, 18, 20, 22$, and $24$, from bottom to top. The solid lines denote $g(x)$ given by the replica analysis. The number of samples is 32,000 for each $N$. Error bars are smaller than the size of markers. As the system size grows, the profiles of $g_{\rm num}(x)$ approach the theoretical prediction.

Figure 6. Size dependence of $g_{\rm num}(0)$, plotted against $1/N$. The data are the same as figure 5. The solid line is provided by quadratic fitting to the data. The point at $1/N = 0$ is the theoretical value derived by the replica method. The data tend to reach this theoretical value as the system size grows.
grows, the numerical data for $x < 1$ exhibit flatter slopes approaching the theoretical prediction $g(x) = \phi'_{\rm RS1}(0)$ for $x < 1$. This can also be seen in figure 6 as the systematic approach of $g_{\rm num}(0)$ to the theoretical value of $g(0) = 0.314\ldots$ derived from the replica analysis. The difference between the numerical extrapolation and the analytical result at $1/N = 0$ is considered to be the systematic error due to higher order contributions of $1/N$. The profiles for $x > 1$, on the other hand, are approximately straight lines, and the slopes appear gentler than that of the theoretical prediction $x\phi'_{\rm RS1}(0)$. However, the data still slowly move closer to $x\phi'_{\rm RS1}(0)$ ($x > 1$) as $N$ becomes larger as a whole, implying consistency with the theoretical prediction.
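The extrapolation to $1/N = 0$ by quadratic fitting can be sketched as below. This is only an illustration under stated assumptions: `extrapolate_quadratic` is a hypothetical helper, it uses ordinary least squares on the rescaled variable $t = (1/N)/\max(1/N)$ for numerical stability, and any values passed to it here are synthetic rather than the data of this paper.

```python
def extrapolate_quadratic(sizes, values):
    """Least-squares fit of values to c0 + c1*t + c2*t^2 with t proportional to 1/N.

    Returns c0, the extrapolated value at 1/N = 0.
    """
    inv = [1.0 / n for n in sizes]
    scale = max(inv)
    ts = [v / scale for v in inv]  # rescaling does not change the intercept

    # Normal equations A c = b for the three coefficients.
    power_sums = [sum(t ** k for t in ts) for k in range(5)]
    a = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * t ** i for t, y in zip(ts, values)) for i in range(3)]

    # Gaussian elimination with partial pivoting.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]

    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * coef[c] for c in range(r + 1, 3))
        coef[r] = (b[r] - tail) / a[r][r]
    return coef[0]
```

With at least three distinct system sizes the fit is well defined; the remaining gap between the intercept and the analytical value then reflects corrections beyond second order in $1/N$.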

Complexity $\Sigma(s)$ can also be assessed from the numerical data. One scheme for evaluating $\Sigma(s)$ is to use the relations of (25) with a polynomial interpolation of the numerical data. We determined the order of the polynomial using Akaike's information criterion [29] and eventually selected a 27th degree polynomial, but the obtained results were not very sensitive to the details of the choice of the polynomial. The assessed profiles of $\Sigma(s)$ are plotted in figure 7. The curves appear to approach the line predicted in the




Figure 7. $\Sigma(s(x))$ obtained from $g_{\rm num}(x)$ using the relation of (25) for $\alpha = 0.5$. The system sizes increase from bottom to top. The solid line denotes the asymptotic line in the thermodynamic limit predicted by the replica analysis.

previous section as $N$ increases, which supports our replica analysis.
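A rough sketch of how a complexity curve can be extracted from sampled values of $g(x)$ is given below. It rests on two assumptions that are not confirmed by this section: that relation (25) has the usual Legendre-transform form $s = \partial g/\partial x$, $\Sigma = g - xs$, and that central finite differences are an acceptable stand-in for the polynomial interpolation used in the text. The function name is hypothetical.

```python
def complexity_from_g(xs, gs):
    """Return (s, Sigma) pairs at interior grid points.

    Assumed relation: s = dg/dx and Sigma = g - x*s (Legendre transform).
    The derivative is approximated by central finite differences.
    """
    curve = []
    for k in range(1, len(xs) - 1):
        s = (gs[k + 1] - gs[k - 1]) / (xs[k + 1] - xs[k - 1])
        sigma = gs[k] - xs[k] * s
        curve.append((s, sigma))
    return curve
```

Finite differences amplify statistical noise in $g_{\rm num}(x)$, which is one reason a smooth fit such as the polynomial interpolation described above is preferable for real data.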

However, the complexity curve shown in figure 7 might lose the information about the correct distribution of the clusters, as mentioned in section 4.2. In order to examine this possibility, we directly evaluate the distribution of pure states in a rather naive manner. We refer to the result of this assessment as the raw complexity, which is defined as

$$ \Sigma_r\big(s = (1/N) \log Q \,\big|\, D^M\big) = \frac{1}{N}\, \Theta\big(C(Q|D^M)\big) \log\big(C(Q|D^M)\big). \tag{37} $$

Taking the sample average yields the typical profile of $\Sigma_r$ as $\Sigma_r(s) = [\Sigma_r(s|D^M)]$, the result of which for $\alpha = 0.5$ is shown in figure 8. We took 32,000 samples in the evaluation for each size and joined the plots to obtain smooth curves. This figure indicates that $\Sigma_r(0)$ approaches the value of the theoretical prediction $\phi'_{\rm RS1}(0)|_{\alpha=0.5} = 0.314\ldots$ from below as $N$ increases. However, $\Sigma_r(s)$ for $s > 0.1$ appears to remain
Figure 8. Plot of the raw complexity $\Sigma_r(s)$ for $\alpha = 0.5$. The system size increases from $N = 14$ to $24$ in increments of 2, from bottom to top. The solid line is the same as that shown in figure 7.

Figure 9. Plot of $s + \Sigma_r(s)$. As the system size increases, the curve appears to converge to a V-shape function, indicating that $\Sigma_r(s)$ is convex downward. The inset shows a close-up of the region enclosed by the dotted ellipse.



approximately constant at zero, indicating that $\Sigma_r(s)$ converges to a convex downward function. We also plot the function $s + \Sigma_r(s)$ in figure 9. This plot shows two peaks and one dip of $s + \Sigma_r(s)$, indicating that $\Sigma_r(s)$ is a convex downward function. The position of the right-hand peak tends to move leftward toward the right terminal point $s = 0.314\ldots$ of the theoretical prediction as the system size increases, while the dip appears to be bounded at the point $s = 0.084$ as shown in the inset. In conclusion, these figures indicate that the exponent that characterizes the size distribution of the pure states, $\Sigma_r(s)$, is not a convex upward function in this system and does not agree with $\Sigma(s)$, which is evaluated by the relation of (25) using $g(x)$.
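The raw complexity can be tabulated directly from the cluster sizes of each sample, as in the following sketch. The function names are hypothetical, and the treatment of sizes that do not occur in a sample (contribution zero) is an assumption about how the sample average is taken.

```python
import math
from collections import Counter


def raw_complexity(cluster_size_list, n):
    """Map each observed cluster size Q to (1/n) log C(Q) for one sample."""
    counts = Counter(cluster_size_list)
    return {q: math.log(c) / n for q, c in counts.items()}


def averaged_raw_complexity(samples, n):
    """Sample average of the raw complexity, returned as sorted (s, Sigma_r) pairs.

    A size Q absent from a sample contributes zero for that sample (assumption).
    """
    totals = Counter()
    for sizes in samples:
        for q, value in raw_complexity(sizes, n).items():
            totals[q] += value
    return [(math.log(q) / n, totals[q] / len(samples)) for q in sorted(totals)]
```

Each entry pairs the entropy $s = (1/N)\log Q$ with the averaged exponent, so the output can be plotted directly against $s$.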

Next, we assessed the rate function for the region of $\alpha > \alpha_s$. In this region, the generation of samples that are perfectly compatible with the Ising perceptrons rarely occurs and is dominated by $s = 0$. Therefore, we numerically evaluated the probability that a given set of samples $D^M$ could be separated by the Ising perceptron, $P_{\rm sep}$, and estimated $R(0)$ as $R(0) = (1/N) \log P_{\rm sep}$.
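A direct Monte Carlo estimate of $P_{\rm sep}$ can be sketched as follows for small $N$. The sketch assumes binary $\pm 1$ patterns and labels drawn uniformly at random and tests separability by brute-force enumeration; the pattern distribution, the function names, and the parameters are illustrative assumptions rather than the setup used in the experiments.

```python
import itertools
import math
import random


def is_separable(patterns, labels):
    """True if some weight vector in {-1,+1}^n satisfies y*(w.x) > 0 for all patterns."""
    n = len(patterns[0])
    for w in itertools.product((-1, 1), repeat=n):
        if all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0
               for x, y in zip(patterns, labels)):
            return True
    return False


def estimate_rate(n, m, trials, rng):
    """Return (P_sep, (1/n) log P_sep) estimated from `trials` random samples."""
    hits = 0
    for _ in range(trials):
        patterns = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
        labels = [rng.choice((-1, 1)) for _ in range(m)]
        if is_separable(patterns, labels):
            hits += 1
    p_sep = hits / trials
    rate = math.log(p_sep) / n if p_sep > 0 else float("-inf")
    return p_sep, rate
```

Because $P_{\rm sep}$ decays exponentially with $N$, the number of trials needed to observe even one separable sample grows quickly, which is consistent with the large sample counts quoted for figures 10 and 11.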

The resultant plots are given in figures 10 and 11 for $\alpha = 1.0$ and $1.5$, respectively. The solid lines in these figures were obtained by linear fitting to the numerical data. These figures show that the theoretical predictions are reasonably consistent with the extrapolated values of the numerical data. The statistical errors are sufficiently small, and hence the differences between the analytical and numerical results should be the systematic errors due to the nonlinearity of the Ising perceptron.
Figure 10. Size dependence of the rate function for $\alpha = 1.0$. The point at $1/N = 0$ is the value predicted by the frozen RSB solution. The system size increases from $N = 12$ to $18$ in increments of 1. The data from 320,000 samples were evaluated for each $N$. The statistical errors are smaller than the markers.

Figure 11. Size dependence of the rate function for $\alpha = 1.5$. The upper and lower plots at $1/N = 0$ are given by the RS2 and frozen RSB solutions, respectively. The system size increases from $N = 10$ to $18$ in increments of 2. The data from 25,600,000 samples were evaluated for each $N$.



6. Summary 

In the present paper, we investigated the structure of the weight space of Ising perceptrons in which a set of random patterns is stored using the derivatives of the generating function of the partition function. This was achieved by carrying out a finite-$n$ replica analysis under the assumption of one-step replica symmetry breaking (1RSB), handling Parisi's 1RSB parameter as a control parameter. For $\alpha < \alpha_s = 0.833\ldots$, the analysis of $n \to 0$ indicates that the characteristic exponent of the size distribution of pure states is not convex upward, which implies that the weight space is equally dominated by a single large cluster of exponentially many weights and exponentially many clusters of a single weight. For $\alpha > \alpha_s$, a set of random patterns is rarely compatible with the Ising perceptron. The $n \to 0$ analysis enables us to assess the rate function that characterizes a small probability that a cluster of a given entropy will emerge after the storage of random patterns. We found that a cluster of finite entropy is generated with a relatively high probability for $\alpha_s < \alpha < \alpha_{\rm GD} = 1.245\ldots$, but this is very rare for $\alpha > \alpha_{\rm GD}$. These theoretical predictions have been validated by extensive numerical experiments. We also drew a complete phase diagram on the $n$-$\alpha$ plane, in which $(n, \alpha) = (1, \alpha_{\rm GD})$ becomes a tricritical point. The line $n = 1$ that passes through the tricritical point is analogous to the Nishimori line in the theory of spin glasses.

We stressed the use of the replica method as a tool for calculating the complexity and rate function. The developed formalism enables the extraction of useful information about typical and atypical behaviors of the objective system from a single generating function in a unified manner. It is hoped that the results of the present study will help to clarify systems with complex phase spaces as well as the replica method itself.
Appendix A. Derivation of RS and 1RSB solutions



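For $n = 1, 2, \ldots \in \mathbb{N}$, the $n$th moment of $Z$ is expressed as

$$ [Z^n]_{D^M} = \sum_{S^1, S^2, \ldots, S^n} \left[ \prod_{a=1}^{n} \prod_{\mu=1}^{M} \Theta\left( \frac{y_\mu}{\sqrt{N}} \sum_{j=1}^{N} S_j^a x_{\mu j} \right) \right]_{D^M}, \tag{A.1} $$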

where the brackets $[\cdots]_{D^M}$ denote the average over the quenched randomness $D^M$. The variable $u_\mu^a \equiv y_\mu S^a \cdot x_\mu / \sqrt{N}$ ($a = 1, 2, \ldots, n$; $\mu = 1, 2, \ldots, M$) can be regarded as a multivariate Gaussian random variable, which is characterized as

$$ [u_\mu^a]_{D^M} = 0, \qquad [u_\mu^a u_\nu^b]_{D^M} = \delta_{\mu\nu} \left( \delta_{ab} + (1 - \delta_{ab}) q^{ab} \right), \tag{A.2} $$

where $q^{ab} = (1/N) \sum_{i=1}^{N} S_i^a S_i^b$ ($a, b = 1, 2, \ldots, n$). This observation yields the following expression:

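$$ \begin{aligned} [Z^n]_{D^M} &= \sum_{S^1, S^2, \ldots, S^n} \prod_{a<b} \int dq^{ab}\, \delta\left( S^a \cdot S^b - N q^{ab} \right) \left[ \prod_{a=1}^{n} \prod_{\mu=1}^{M} \Theta(u_\mu^a) \right]_{D^M} \\ &= \int \prod_{a<b} \frac{dq^{ab}\, d\hat{q}^{ab}}{2\pi i} \exp\left\{ N \left( -\sum_{a<b} \hat{q}^{ab} q^{ab} + \log \sum_{S^1, \ldots, S^n} e^{\sum_{a<b} \hat{q}^{ab} S^a S^b} + \alpha \log \left[ \prod_{a=1}^{n} \Theta(u^a) \right]_u \right) \right\}, \end{aligned} \tag{A.3} $$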



where $[\cdots]_u$ denotes the average with respect to the multivariate Gaussian variables, the moments of which are given by (A.2). In order to derive the previous expression, we used the Fourier expression of the delta function

$$ \delta\left( S^a \cdot S^b - N q^{ab} \right) = \int_{-i\infty}^{+i\infty} \frac{d\hat{q}^{ab}}{2\pi i} \exp\left( \hat{q}^{ab} \left( S^a \cdot S^b - N q^{ab} \right) \right). \tag{A.4} $$



Applying the saddle-point method to (A.3), we immediately obtain (12). In order to investigate $\phi(n)$ for $n \in \mathbb{R}$, we need an ansatz on the form of the saddle point $q^{ab}$. We first adopt the RS ansatz

$$ q^{ab} = q, \quad \hat{q}^{ab} = \hat{q} \qquad (a < b = 1, 2, \ldots, n). \tag{A.5} $$
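Under this assumption, we obtain

$$ \sum_{a<b} \hat{q}^{ab} q^{ab} = \frac{1}{2} n(n-1) \hat{q} q, \tag{A.6} $$

$$ \sum_{S^1, S^2, \ldots, S^n} e^{\sum_{a<b} \hat{q}^{ab} S^a S^b} = e^{-\frac{1}{2} n \hat{q}} \int Dz \left( 2 \cosh \sqrt{\hat{q}}\, z \right)^n. \tag{A.7} $$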

Under the RS ansatz, the Gaussian variable $u^a$ can be decomposed into two independent Gaussian variables of zero mean and unit variance, $x^a$ and $z$, as

$$ u^a = \sqrt{1 - q}\, x^a + \sqrt{q}\, z. \tag{A.8} $$
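Using this expression, we obtain

$$ \left[ \prod_{a=1}^{n} \Theta(u^a) \right]_u = \int_{-\infty}^{+\infty} Dz \left[ E\left( \sqrt{\frac{q}{1-q}}\, z \right) \right]^n, \tag{A.9} $$

where $E(u) = \int_u^{\infty} Dz$. Using the above expressions, we obtain

$$ \phi_{\rm RS}(n) = \mathop{\rm Extr}_{q, \hat{q}} \left\{ -\frac{1}{2} n(n-1) \hat{q} q - \frac{1}{2} n \hat{q} + \log \int Dz \left( 2 \cosh \sqrt{\hat{q}}\, z \right)^n + \alpha \log \int Dz \left[ E\left( \sqrt{\frac{q}{1-q}}\, z \right) \right]^n \right\}. \tag{A.10} $$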

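The saddle point conditions are

$$ q = \frac{\int Dz \cosh^n\left( \sqrt{\hat{q}}\, z \right) \tanh^2\left( \sqrt{\hat{q}}\, z \right)}{\int Dz \cosh^n\left( \sqrt{\hat{q}}\, z \right)}, \tag{A.11} $$

$$ \hat{q} = \frac{\alpha}{1 - q}\, \frac{\int Dz\, E^{n-2} (E')^2}{\int Dz\, E^n}, \tag{A.12} $$

where $E'(x) = dE(x)/dx = -e^{-x^2/2}/\sqrt{2\pi}$. Note that the arguments of $E$ and $E'$ are $\sqrt{q/(1-q)}\, z$. As previously noted, there are two solutions to (A.11) and (A.12), i.e., the RS1 and RS2 solutions presented in section 3.1.1.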
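For numerical work with (A.9) to (A.12), the functions $E$ and $E'$ and the Gaussian measure $Dz$ can be evaluated as in the sketch below. The function names, the integration cutoff, and the trapezoidal rule are illustrative choices, not taken from the paper; $E(u)$ is written in terms of the complementary error function through the identity $E(u) = \frac{1}{2} {\rm erfc}(u/\sqrt{2})$.

```python
import math


def E(u):
    """E(u) = integral of the unit Gaussian density from u to infinity."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))


def E_prime(x):
    """Derivative of E: minus the unit Gaussian density."""
    return -math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_average(f, cutoff=10.0, steps=4000):
    """Approximate the integral of f(z) against Dz with the trapezoidal rule on [-cutoff, cutoff]."""
    h = 2.0 * cutoff / steps
    total = 0.0
    for k in range(steps + 1):
        z = -cutoff + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def energetic_term(q, n):
    """Integral of E(sqrt(q/(1-q)) z)^n against Dz, the right-hand side of (A.9)."""
    c = math.sqrt(q / (1.0 - q))
    return gaussian_average(lambda z: E(c * z) ** n)
```

These building blocks are enough to evaluate the right-hand sides of (A.11) and (A.12) for given $q$, $\hat{q}$ and $n$, for example inside a fixed-point iteration; convergence of such an iteration is not guaranteed and would need to be checked case by case.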

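Next, we use the 1RSB ansatz. The replica indices are divided into $n/m$ groups of identical size $m$, and $q^{ab}$ and $\hat{q}^{ab}$ are parameterized as

$$ (q^{ab}, \hat{q}^{ab}) = \begin{cases} (q_1, \hat{q}_1) & (a \text{ and } b \text{ belong to the same group}) \\ (q_0, \hat{q}_0) & (\text{otherwise}) \end{cases} \tag{A.13} $$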

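This assumption yields

$$ \sum_{a<b} \hat{q}^{ab} q^{ab} = \frac{1}{2} n(m-1) \hat{q}_1 q_1 + \frac{1}{2} n(n-m) \hat{q}_0 q_0, \tag{A.14} $$

$$ \sum_{S^1, S^2, \ldots, S^n} e^{\sum_{a<b} \hat{q}^{ab} S^a S^b} = e^{-\frac{1}{2} n \hat{q}_1} \int Dz_0 \left( \int Dz_1 (2 \cosh h)^m \right)^{n/m}, \tag{A.15} $$

where $h = \sqrt{\hat{q}_1 - \hat{q}_0}\, z_1 + \sqrt{\hat{q}_0}\, z_0$. The Gaussian variable can be decomposed to obtain

$$ u^{\sigma a} = \sqrt{1 - q_1}\, y_{\sigma a} + \sqrt{q_1 - q_0}\, x_\sigma + \sqrt{q_0}\, z, \tag{A.16} $$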
where $x_\sigma$, $y_{\sigma a}$ and $z$ are independent Gaussian variables of zero mean and unit variance. The index $\sigma$ indicates a block, and a pair of $\sigma$ and $a$ specifies a replica in the $\sigma$ block. This transformation yields the following expression:

$$ \left[ \prod_{a=1}^{n} \Theta(u^a) \right]_u = \int Dz \left( \int Dx \left( E(y_0(z, x)) \right)^m \right)^{n/m}, \tag{A.17} $$

where

$$ y_0(z, x) = -\sqrt{\frac{q_1 - q_0}{1 - q_1}}\, x - \sqrt{\frac{q_0}{1 - q_1}}\, z. \tag{A.18} $$

Finally, we obtain

$$ \begin{aligned} \phi_{\rm 1RSB}(n, m) = \mathop{\rm Extr}_{q_1, q_0, \hat{q}_1, \hat{q}_0} \Bigg\{ & -\frac{1}{2} n(m-1) \hat{q}_1 q_1 - \frac{1}{2} n(n-m) \hat{q}_0 q_0 - \frac{1}{2} n \hat{q}_1 + \log \int Dz_0 \left( \int Dz_1 (2 \cosh h)^m \right)^{n/m} \\ & + \alpha \log \int Dz_0 \left( \int Dz_1 \left( E(y_0(z_0, z_1)) \right)^m \right)^{n/m} \Bigg\}. \end{aligned} \tag{A.19} $$

Taking the extremization, we can derive the saddle-point equations. The possible solutions of the equations are discussed in section 3.1.2.